Generation of High Quality Audio Natural Emotional Speech Corpus using Task Based Mood Induction

Authors

  • C. Cullen
  • B. Vaughan
  • S. Kousidis
  • Wang Yi
  • C. McDonnell
Abstract

Detecting emotional dimensions [1] in speech is an area of great research interest, notably as a means of improving human-computer interaction in areas such as speech synthesis [2]. In this paper, a method of obtaining high-quality emotional speech audio assets is proposed. How emotional content should be obtained is the subject of considerable debate, with acted [3] and natural [4] speech distinguished on grounds of authenticity. Mood Induction Procedures (MIPs) [5] are often employed to stimulate emotional dimensions in a controlled environment. This paper details experimental procedures based on MIP 4, using performance-related tasks to elicit activation and evaluation responses from participants. Each task involves two participants, who must co-operate to complete it [6] within the allotted time. Experiments designed in this manner also allow high-quality audio assets to be specified (notably 24-bit/192 kHz [7]) and recorded within an acoustically controlled environment [8], which reduces unwanted acoustic factors in the recorded speech signal. Once suitable assets are obtained, they will be assessed in order to segregate them by emotional dimension. The most statistically robust method of evaluation is the listening test, which determines the perceived emotional dimensions within an audio clip. In this experiment, the FeelTrace [9] rating tool is employed within user listening tests to rate the emotional dimensions of each audio clip.
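Two quantities in the abstract lend themselves to a quick sketch: the raw data rate implied by 24-bit/192 kHz recording, and the reduction of several listeners' ratings to a single emotional-dimension label per clip. The Python below is a minimal illustration under stated assumptions, not the authors' procedure: the function names, the per-listener trace format, the [-1, 1] range on the evaluation and activation axes, and the four-quadrant labelling are all hypothetical.

```python
from statistics import mean


def pcm_bytes_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Uncompressed PCM data rate: samples/s x bytes per sample x channels."""
    return sample_rate_hz * (bit_depth // 8) * channels


def quadrant(evaluation: float, activation: float) -> str:
    """Map a point in a FeelTrace-style 2-D space to a coarse region.

    Assumption: evaluation (horizontal) and activation (vertical) both lie in
    [-1, 1], and 0 on each axis splits the space into four quadrants.
    """
    valence = "positive" if evaluation >= 0 else "negative"
    arousal = "active" if activation >= 0 else "passive"
    return f"{valence}-{arousal}"


def summarise_clip(traces):
    """Reduce listener ratings for one clip to a mean point and a quadrant label.

    `traces` is a hypothetical format: one list of (evaluation, activation)
    samples per listener. Each listener's trace is averaged first, so that a
    listener with a longer trace does not outweigh the others.
    """
    per_listener = [
        (mean(e for e, _ in trace), mean(a for _, a in trace)) for trace in traces
    ]
    ev = mean(e for e, _ in per_listener)
    ac = mean(a for _, a in per_listener)
    return ev, ac, quadrant(ev, ac)


if __name__ == "__main__":
    # One mono channel at 24-bit/192 kHz: 192000 * 3 * 1 bytes per second.
    print(pcm_bytes_per_second(192_000, 24, 1))  # 576000
    ratings = [
        [(0.4, 0.6), (0.5, 0.7)],  # listener 1 (made-up values)
        [(0.2, 0.5), (0.3, 0.4)],  # listener 2 (made-up values)
    ]
    print(summarise_clip(ratings))
```

At 24-bit/192 kHz a single channel produces 576,000 bytes of uncompressed PCM per second, about 34.6 MB (decimal) per minute, which is worth budgeting for when two participants are recorded on separate channels. Splitting the rating space at 0 is the simplest possible segregation rule; a real study might instead leave a neutral band around the origin.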

Similar articles

Emotional Speech Corpus Creation, Structure, Distribution and Re-Use

This paper details the on-going creation of a natural emotional speech corpus, its structure, distribution, and re-use. Using Mood Induction Procedures (MIPs), high quality emotional speech assets are obtained, analysed, tagged (for acoustic features), annotated and uploaded to an online speech corpus. This method structures the corpus in a logical and coherent manner, allowing it to be utilize...

Using Signal Detection Theory to Investigate the Impact of Mood Induction on Emotional Information Processing in High BAS/BIS Individuals

Objective: The main objective of this study was to investigate explicit memory bias in people with high BAS/BIS sensitivity in different manipulated mood states. Methods: Using purposive sampling, seventy-four participants (undergraduate students) were selected based on z-scores of 480 using Carver and White’s BAS/BIS scale. They were distributed as: 24 wi...

Emotional Speech Synthesis with Corpus-Based Generation of F0 Contours Using Generation Process Model

A method was developed for the corpus-based synthesis of emotional speech. Fundamental frequency (F0) contours were synthesized by predicting command values of the generation process model using binary regression trees with the input of linguistic information of the sentence to be synthesized. Because of the model constraint, a certain quality is still kept in synthesized speech even if the pre...

Improvement in corpus-based generation of F0 contours using generation process model for emotional speech synthesis

In our fully automatic corpus-based method of generating fundamental frequency (F0) contours for emotional speech synthesis, an improvement was realized related to the process of corpus preparation. The method assumes the generation process model and predicts its command parameters using binary regression trees with inputs of linguistic information of the sentence to be synthesized. Because of ...

Corpus-based Synthesis of F0 Conto Using the Generation P

A corpus-based generation of fundamental frequency (F0) contours was realized for emotional speech synthesis. The method, originally developed for read speech, is to predict command values of the F0 contour generation process model with the input of linguistic information of the sentence to be synthesized. Since the generated F0 contour is under the model constraint, a certain quality is still ...

Publication date: 2006